CAPS: A Cross-genre Author Profiling System

Authors

  • Ivan Bilan
  • Desislava Zhekova
Abstract

This paper describes the participation of the Cross-genre Author Profiling System (CAPS) in the PAN16 shared task [15]. The classification system uses parts of speech, collocations, connective words and various other stylometric features to differentiate between the writing styles of male and female authors as well as between different age groups. On the official test set (test set 2) for English, the system achieves the second-best score for gender identification: 74.36% accuracy, against 75.64% for the best performing system (BPS). For age classification on the same set, we report an accuracy of 44.87% (BPS: 58.97%). For Spanish, CAPS reaches an accuracy of 62.50% (BPS: 73.21%) for gender and 46.43% (BPS: 51.79%) for age. For Dutch, where the task did not target age, gender accuracy is the lowest of the three languages at 55.00% (BPS: 61.80%). For comparison, we also tested CAPS on single-genre classification of author gender and age on the PAN14 and PAN15 datasets, achieving comparable performance.
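
To make the kind of features named in the abstract more concrete, the following is a minimal sketch of a few simple stylometric measurements for a single document. It is not the CAPS implementation: the function name `stylometric_features`, the `CONNECTIVES` word list, and the choice of features are all assumptions made for illustration, and part-of-speech and collocation features are left out because they would need a tagger.

```python
import re
from collections import Counter

# Hypothetical connective list, for illustration only;
# the list actually used by CAPS is not given in the abstract.
CONNECTIVES = {"however", "therefore", "moreover", "because", "although", "furthermore"}


def stylometric_features(text: str) -> dict:
    """Return a few simple style features for one document."""
    # Lowercase and keep alphabetic tokens (apostrophes allowed inside words).
    tokens = re.findall(r"[a-z']+", text.lower())
    n = len(tokens) or 1  # avoid division by zero on empty input
    counts = Counter(tokens)
    return {
        "n_tokens": len(tokens),
        "avg_word_len": sum(len(t) for t in tokens) / n,
        # Share of tokens that are connective words.
        "connective_rate": sum(counts[c] for c in CONNECTIVES) / n,
        # Vocabulary richness: distinct tokens over total tokens.
        "type_token_ratio": len(counts) / n,
    }


features = stylometric_features("However, I stayed home because it rained.")
```

A feature dictionary like this could be computed per author and passed to any standard classifier; which classifier CAPS uses is not stated in the abstract.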

Similar resources

Exploring the Effects of Cross-Genre Machine Learning for Author Profiling in PAN 2016

Author profiling deals with the study of various profile dimensions of an author such as age and gender. This work describes our methodology proposed for the task of cross-genre author profiling at PAN 2016. We address gender and age prediction as a classification task and approach this problem by extracting stylistic and lexical features for training a logistic regression model. Furthermore, w...

Cross-Genre Age and Gender Identification in Social Media

This paper gives a brief description of the methods adopted for the author profiling task as part of the PAN 2016 competition [1]. Author profiling is the task of predicting the author’s age and gender from his/her writing. In this paper, we follow a two-level ensemble approach to tackle the cross-genre author profiling task where training documents and testing documents are from different g...

Overview of the 4th Author Profiling Task at PAN 2016: Cross-Genre Evaluations

This overview presents the framework and the results of the Author Profiling task at PAN 2016. The objective was to predict age and gender from a cross-genre perspective. For this purpose a corpus from Twitter has been provided for training, and different corpora from social media, blogs, essays, and reviews have been provided for evaluation. Altogether, the approaches of 22 participants were e...

Profiling Microblog Authors using Concreteness and Sentiment - Know-Center at PAN 2016 Author Profiling

The PAN 2016 author profiling task is a supervised classification problem on cross-genre documents (tweets, blog and social media posts). Our system makes use of concreteness, sentiment and syntactic information present in the documents. We train a random forest model to identify gender and age of a document’s author. We report the evaluation results received by the shared task.

Cross-Genre Author Profile Prediction Using Stylometry-Based Approach

The author profiling task aims to identify different traits of an author by analyzing his/her written text. This study presents a stylometry-based approach for detecting author traits (gender and age) for cross-genre author profiles. In our proposed approach, we used different types of stylistic features including 7 lexical features, 16 syntactic features, 26 character-based features and 6 vocab...

Publication date: 2016